
fix: Fix failing train tests for v3 #5814

Merged
mohamedzeidan2021 merged 5 commits into aws:master from mohamedzeidan2021:master
May 1, 2026


mohamedzeidan2021 (Collaborator) commented Apr 30, 2026

Fix instance type for JumpStart training integration test

test_jumpstart_train[huggingface-spc-bert-base-cased] was failing with ValueError: Training is not supported for model ID with instance type: ml.g5.xlarge. The model's SupportedTrainingInstanceTypes only includes ml.g4dn.* and ml.p3.* variants. Replaced ml.g5.xlarge with ml.g4dn.xlarge.
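The mismatch described above can be illustrated with a small standalone check. This is only a sketch: the two glob patterns are copied from the description, while the helper name `is_supported_training_instance` and the idea of matching with `fnmatch` are assumptions made for illustration. The real list comes from the model's SupportedTrainingInstanceTypes metadata, and the SDK's own validation may work differently.

```python
from fnmatch import fnmatch

# Assumption: patterns transcribed from the PR description, not read from
# the model's actual SupportedTrainingInstanceTypes metadata.
SUPPORTED_TRAINING_INSTANCE_PATTERNS = ["ml.g4dn.*", "ml.p3.*"]


def is_supported_training_instance(instance_type, patterns=SUPPORTED_TRAINING_INSTANCE_PATTERNS):
    """Return True if instance_type matches any supported glob pattern.

    Hypothetical helper for illustration; it is not part of the SageMaker SDK.
    """
    return any(fnmatch(instance_type, pattern) for pattern in patterns)


if __name__ == "__main__":
    # The old and new test parameters from this PR.
    for candidate in ("ml.g5.xlarge", "ml.g4dn.xlarge"):
        status = "supported" if is_supported_training_instance(candidate) else "not supported"
        print(f"{candidate}: {status}")
```

Under these assumed patterns, the instance type the test used before the change is rejected and the replacement is accepted, which is consistent with the ValueError quoted above.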

Note on test_base_model_false_still_works failure:

The other failing test (TestLLMAsJudgeBaseModelFix::test_base_model_false_still_works) is a pre-existing flaky test unrelated to both this change and the release commits. None of the release changes touch the evaluation pipeline code path:

"Make _PipelineExecution a public class" affects the pipeline execution class visibility, not the evaluation pipeline template rendering or _get_or_create_pipeline logic.
"Add CodeArtifact support for ModelTrainer" and "Wire FrameworkProcessor code_location" affect training source code/dependency installation, not the evaluator module.
The S3 bucket/path fixes are in core S3 utilities, not in the evaluation pipeline template selection.
The git_utils and service-2.json changes are unrelated to evaluation.
The test fails due to a race condition in the test itself: pytest-xdist runs test_base_model_evaluation_uses_correct_weights (evaluate_base_model=True) and test_base_model_false_still_works (evaluate_base_model=False) in parallel. Both call _get_or_create_pipeline with the same pipeline name prefix, so one test's pipeline.update() overwrites the other's pipeline definition before execution starts. This is a shared-resource concurrency issue in the test infrastructure that predates these release changes. A separate fix (e.g., marking the class @pytest.mark.serial) is needed to address it.
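Besides serializing the tests, another way to avoid this kind of collision is to give each test its own pipeline name so that parallel workers never update the same definition. The sketch below is not what this PR does and is not taken from the repository: `unique_pipeline_name` and the example prefix are hypothetical, and whether `_get_or_create_pipeline` accepts a caller-supplied name is an assumption. The `PYTEST_XDIST_WORKER` environment variable is set by pytest-xdist in worker processes.

```python
import os
import uuid


def unique_pipeline_name(prefix: str) -> str:
    """Build a pipeline name that is distinct per xdist worker and per call.

    Hypothetical helper for illustration only. PYTEST_XDIST_WORKER is set by
    pytest-xdist in worker processes (for example "gw0"); when the tests run
    without xdist, the fallback "main" is used instead.
    """
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    # A short random suffix keeps names distinct even within one worker.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{worker}-{suffix}"


if __name__ == "__main__":
    # Assumed prefix; the real prefix used by the evaluator is not shown in the PR.
    print(unique_pipeline_name("llm-as-judge-eval"))
    print(unique_pipeline_name("llm-as-judge-eval"))
```

The trade-off is that uniquely named pipelines accumulate unless the tests clean them up, whereas a serial marker keeps the shared name and gives up parallelism for that class.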

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mujtaba1747 previously approved these changes May 1, 2026
}


@pytest.mark.serial

If this test passes when run serially, there may be a race condition. We can fix it later, post-release.

mohamedzeidan2021 (Collaborator, Author) commented May 1, 2026

[Image: Screenshot 2026-05-01 at 10 28 26 AM]

All integ tests passed.

I accidentally pushed a commit but removed it, so the checks restarted. I took this screenshot before that, though.

mollyheamazon (Contributor)

mohamedzeidan2021 merged commit 1885e4c into aws:master on May 1, 2026
73 of 108 checks passed


4 participants